Abstract

We introduce a model of proteins in which all of the key atoms in the protein backbone are
accounted for, thus extending the Freely Rotating Chain model. We use average bond lengths and
average angles from the Protein Data Bank as input parameters, leaving the number of residues as
the only variable. The model is used to study the stretching of proteins in the entropic regime. The
results of our Monte Carlo simulations agree well with experimental data, suggesting
that the force-extension curve is universal and does not depend on the side chains or the primary
structure of proteins.
I. INTRODUCTION



Accurate modeling of protein structure and dynamics is an immense challenge in the
biological sciences, and has elicited intense interest among physicists. Many models have
been developed, from first-principles studies to coarse-grained phenomenological theories.
A major issue is the trade-off between the inclusion of microscopic details on the one hand and
computational tractability and efficiency on the other. It is hoped that the latter can be improved
by removing selected microscopic details that do not affect the accuracy of the results.

In this work we introduce a model of the protein backbone and apply it to the problem
of protein stretching, for which many experimental results are available. Restricting ourselves
to the entropic regime allows us to model the backbone of proteins accurately without resorting
to effective interactions or force fields.



A. Experiments 



The elastic properties of single molecules can be measured by atomic force microscopy (AFM)
and optical tweezers. One of the proteins most studied by these experimental methods is titin,
which plays an important role in the elasticity of muscles [6]. The segment of titin of interest
here is the I-band, which is made of similar modules (also called repeats or domains) with an
immunoglobulin-like (Ig) structure [13]. A single folded module can resist pulling below a
threshold force. If the force exceeds the threshold, the domain unfolds. The unfolded domain
refolds into its native state when the force is removed.

If we pull a chain of folded Ig domains slowly, the force needed to keep the protein
extended increases until one of the modules unfolds. This is called an unfolding event,
after which a much weaker force suffices to keep the protein extended. As we increase
the extension, the modules unfold one by one until we obtain a fully extended chain. A plot of
force versus extension during stretching shows a saw-tooth pattern (Fig. 5).
The stretching of titin has also been studied with various theoretical methods, such as molecular
dynamics […], lattice models […] and the thick chain (TC) model […].
B. Models



Many theoretical models have been developed to study macromolecules and proteins.
Among them, the Freely Jointed Chain (FJC), the Freely Rotating Chain (FRC) and the Worm Like
Chain (WLC) are the best known […]. They are used extensively to model the
backbone of proteins by treating the protein as a chain of Cα atoms only, usually
with an effective interaction among the Cα atoms to recover the neglected details. We argue here
that these models are not accurate enough to model the protein backbone, and that they usually have
at least one free parameter in addition to the chain length N (for example, the persistence
length) that must be determined by fitting the theory to experiments. Instead of using such
a parameter, we can simply use the well-known geometry of peptides.

The Worm Like Chain is used to study the elasticity of macromolecules [14]. It is simple and
works well whenever a continuum approximation applies. It is especially suited to very
large and stiff macromolecules like DNA [15, 16], whereas it becomes physically unjustifiable
when applied to more coarse-grained and more flexible chains like proteins. For instance, in
the stretching of titin's Ig domains, the persistence length of the fitted WLC is ~0.4 nm. Although
this value is about the size of a single peptide unit […], the WLC is frequently used in the
literature to fit and explain experimental data, because a better alternative has been lacking.
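For comparison, WLC fits to protein stretching data are commonly made with the Marko-Siggia interpolation formula, f = (k_BT/P)[1/(4(1 - x/L)^2) - 1/4 + x/L], where P is the persistence length and L the contour length. The sketch below evaluates it; the value k_BT = 4.11 pN·nm (roughly room temperature) and the 30 nm contour length in the example are assumptions for illustration, not parameters of this work.

```python
KT_PN_NM = 4.11  # assumed thermal energy k_B*T near 298 K, in pN*nm


def wlc_force(x_nm: float, contour_nm: float, persistence_nm: float,
              kT: float = KT_PN_NM) -> float:
    """Marko-Siggia interpolation: force (pN) needed to hold a WLC at extension x."""
    z = x_nm / contour_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must satisfy 0 <= x < contour length")
    return (kT / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


if __name__ == "__main__":
    # Hypothetical 30 nm contour length with the ~0.4 nm persistence length quoted above.
    for x in (10.0, 20.0, 25.0, 28.0):
        print(x, round(wlc_force(x, 30.0, 0.4), 1))
```

Because P is comparable to the length of a single peptide unit, the continuum assumption behind this formula is hard to justify for proteins, which is the point made above.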

The Freely Jointed Chain is a chain of monomers with fixed bond length. In this model,
bond angles can take any value, which is not the case in real proteins [17].
However, the Freely Jointed Chain has an advantage over more detailed models: it has an
exact solution for stretching.
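The exact solution referred to above is the Langevin-function result for the mean extension of an FJC under a constant force, <x> = N b [coth(f b / k_BT) - k_BT / (f b)]. A minimal sketch follows; k_BT = 4.11 pN·nm and the use of the Cα-Cα distance as the FJC bond length in the example are assumptions chosen for illustration.

```python
import math

KT_PN_NM = 4.11  # assumed thermal energy k_B*T near 298 K, in pN*nm


def langevin(u: float) -> float:
    """Langevin function L(u) = coth(u) - 1/u."""
    if abs(u) < 1e-4:
        # Series expansion avoids the cancellation between coth(u) and 1/u near 0.
        return u / 3.0 - u**3 / 45.0
    return 1.0 / math.tanh(u) - 1.0 / u


def fjc_extension(force_pn: float, n_bonds: int, bond_nm: float,
                  kT: float = KT_PN_NM) -> float:
    """Mean extension <x> = N b L(f b / kT) of a freely jointed chain, in nm."""
    return n_bonds * bond_nm * langevin(force_pn * bond_nm / kT)


if __name__ == "__main__":
    # Illustration only: 100 bonds of 0.382 nm (the Ca-Ca distance of Table I).
    for f in (1.0, 10.0, 100.0):
        print(f, round(fjc_extension(f, 100, 0.382), 2))
```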

The Freely Rotating Chain has one more constraint than the FJC: the bond angles are fixed and
do not change […]. The Freely Rotating Chain therefore behaves more like real proteins than the
FJC does. This model has been used to study different aspects of proteins and polymers, including
our problem of interest […]. However, it does not capture all the features of bond angles, because
the angle between consecutive Cα–Cα bonds is not completely rigid in real proteins (Table II) [17].
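To make the FRC assumption concrete: if every pair of consecutive bonds has the same <u_i · u_(i+1)> = c, the mean-square end-to-end distance of n bonds of length b is <R^2> = n b^2 [(1 + c)/(1 - c) - 2c(1 - c^n)/(n(1 - c)^2)]. The sketch below evaluates this standard result; plugging in the Cα-Cα values of Tables I and II (b ≈ 0.382 nm, c ≈ 0.19) is purely illustrative, since Table II shows that c is not actually fixed in real proteins.

```python
def frc_mean_square_r(n_bonds: int, bond_nm: float, cos_theta: float) -> float:
    """<R^2> (nm^2) of a freely rotating chain of n_bonds bonds of length bond_nm.

    cos_theta is the fixed dot product of consecutive unit bond vectors.
    """
    c, n = cos_theta, n_bonds
    if c >= 1.0:
        raise ValueError("cos_theta must be smaller than 1")
    return n * bond_nm**2 * ((1 + c) / (1 - c) - 2 * c * (1 - c**n) / (n * (1 - c) ** 2))


if __name__ == "__main__":
    # Illustrative only: Ca-Ca virtual bonds, b = 0.382 nm, <cos> = 0.19.
    for n in (10, 100, 1000):
        ratio = frc_mean_square_r(n, 0.382, 0.19) / (n * 0.382**2)
        print(n, round(ratio, 3))  # tends to (1 + c) / (1 - c) for large n
```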

In our model, we consider the carbon and nitrogen atoms of the backbone (Cα, C and N)
and follow the Ramachandran picture. This can be considered a fine-grained version
of the FRC or a coarse-grained version of a Four Bead model. Our model is the simplest
model that includes all the geometric properties of the protein backbone, and it does not
need any fitting parameters.
FIG. 1: Peptide units and their average dimensions (nanometer) and average angles, from Protein
Data Bank (PDB).







FIG. 2: (Color online) A Three Bead Rotating Chain. The rotations are done around Cα–C and
Cα–N bonds (bold black lines). We do not permit any rotation around C–N bonds (red lines in color
prints, or thin gray lines connecting the smaller circles in black-and-white prints).

In this work we are interested only in the geometric details rather than the effective interactions,
so that we can keep the model simple and clear; therefore, we study only the mechanical
properties of proteins in the entropic regime and compare the results with experiments.

II. MODEL AND SIMULATION 

We use the Freely Rotating Chain model with a small variation: some bonds cannot be a
pivot. If we write the chain as a binary sequence of 0 and 1, in which 1 represents a bond
that can be a pivot for rotations and 0 a bond that cannot, then our chain is represented as

$$[101]_n \qquad (1)$$

where $[\ldots]_n$ denotes n repeats of the argument. This is equivalent to considering C and N
atoms in addition to Cα atoms and using a Ramachandran picture, so that the chain only
rotates around Cα–C and Cα–N bonds [3] (Fig. 2). Since the chain does not rotate
around C–N bonds, the structure is specified by a single pair of angles (φ, ψ) for each Cα
junction. These angles are known as the Ramachandran angles [3].



This is the simplest model that includes the Ramachandran angles; it has more detail than the
FRC and is simpler than the Four Bead model.

To set the distances and angles in our simulation, we analyzed the Protein Data Bank (PDB)
and calculated the averages of the relevant quantities over all polypeptide chains in the database
(72212 peptide units). The results are shown in Fig. 1, and more details are given in Tables I and II.

The largest rigid distance along the backbone of proteins is the one between two sequential
Cα atoms; therefore, a good approximation is to consider a chain of Cα atoms with fixed
bond length. On the other hand, it is a poorer approximation to assume that proteins behave as
Freely Rotating Chains, because the angle between consecutive Cα–Cα bonds has a relatively
large standard deviation and is not rigid (Table II).
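As a sketch of how such averages can be computed, the functions below take the Cα coordinates of each chain, collect the Cα-Cα distances and the dot products of consecutive normalized Cα-Cα bond vectors, and pool them over all chains to give a mean and standard deviation of the kind listed in Tables I and II. Parsing PDB files is omitted, the function names are hypothetical, and the use of the population standard deviation is our assumption; the original analysis may differ in these details.

```python
import math
from statistics import mean, pstdev


def ca_bond_samples(ca_coords):
    """Return (Ca-Ca distances, dot products of consecutive unit Ca-Ca vectors)
    for one chain given as a list of (x, y, z) Ca coordinates."""
    bonds = [tuple(q[j] - p[j] for j in range(3)) for p, q in zip(ca_coords, ca_coords[1:])]
    lengths = [math.sqrt(sum(c * c for c in b)) for b in bonds]
    units = [tuple(c / n for c in b) for b, n in zip(bonds, lengths)]
    dots = [sum(u[j] * v[j] for j in range(3)) for u, v in zip(units, units[1:])]
    return lengths, dots


def summarize(chains):
    """Pool the samples of all chains; return (mean, SD) for distances and dot products."""
    all_lengths, all_dots = [], []
    for chain in chains:
        lengths, dots = ca_bond_samples(chain)
        all_lengths.extend(lengths)
        all_dots.extend(dots)
    return (mean(all_lengths), pstdev(all_lengths)), (mean(all_dots), pstdev(all_dots))
```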

III. RESULTS 

We performed equilibrium Monte Carlo simulations using the standard Metropolis algorithm [22]
with pivot moves [23] and a simple Hamiltonian of the form $H = -\mathbf{f}\cdot\mathbf{r}$, where
$\mathbf{f}$ is the pulling force and $\mathbf{r}$ is the position of the end of the chain, while the
other end is fixed at the origin. Since proteins are pulled slowly in the experiments, equilibrium
Monte Carlo is appropriate. We used the default random number generator of the GNU C/C++
compiler to perform the simulations.

We started from zero force and an extended structure. We ran the simulation until
equilibrium was reached, then increased the force gradually. At each step we let the
system reach equilibrium and then measured the average extension along the direction
of the force. This process is equivalent to AFM experiments with a soft cantilever [24].
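A minimal Python sketch of this protocol is given below; it is not the authors' implementation. It builds a planar, fully extended Cα-C-N backbone, pivots only about the bonds marked 1 in Eq. (1), accepts moves with the Metropolis rule for H = -f z_end (force along +z, first atom fixed), and ramps the force while measuring the mean extension. Bond lengths come from Table I (converted to nm). The bond angle at Cα reads the 69.6° entry of Table II as the angle between consecutive bond vectors; the angles at C and N, k_BT, the step counts and all function names are assumptions. As in the model, there is no self-interaction or excluded volume.

```python
import math
import random

BONDS_NM = (0.153, 0.133, 0.147)    # Ca-C, C-N, N-Ca bond lengths (Table I, in nm)
ANGLES_DEG = (110.4, 116.0, 122.0)  # bond angles at Ca, C, N (C and N values assumed)
KT = 4.11                           # assumed k_B*T in pN*nm (about 298 K)


def build_extended_chain(n_units):
    """Planar all-trans backbone Ca-C-N-...-Ca with 3*n_units bonds, starting at the origin."""
    pts = [(0.0, 0.0, 0.0)]
    heading = 0.0  # angle of the current bond from the +z axis, in the x-z plane
    for i in range(3 * n_units):
        if i > 0:
            # Atom i (type i % 3: Ca, C, N) joins bonds i-1 and i; alternating the
            # turn direction produces the extended trans zigzag.
            turn = math.radians(180.0 - ANGLES_DEG[i % 3])
            heading += turn if i % 2 else -turn
        b = BONDS_NM[i % 3]
        x, y, z = pts[-1]
        pts.append((x + b * math.sin(heading), y, z + b * math.cos(heading)))
    return pts


def pivot_bonds(n_atoms):
    """Bond i joins atoms i and i+1; the pattern [101]_n excludes every C-N bond."""
    return [i for i in range(n_atoms - 1) if i % 3 != 1]


def rotate(p, a, k, c, s):
    """Rodrigues rotation of point p about the axis through a with unit direction k."""
    v = [p[j] - a[j] for j in range(3)]
    kv = k[0] * v[0] + k[1] * v[1] + k[2] * v[2]
    kxv = (k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0])
    return tuple(a[j] + v[j] * c + kxv[j] * s + k[j] * kv * (1.0 - c) for j in range(3))


def metropolis_pivot_step(pts, pivots, force_pn, rng):
    """One pivot trial: uniform bond and angle (symmetric proposal), Metropolis acceptance."""
    i = rng.choice(pivots)
    a, b = pts[i], pts[i + 1]
    d = [b[j] - a[j] for j in range(3)]
    length = math.sqrt(sum(x * x for x in d))
    k = [x / length for x in d]
    theta = rng.uniform(-math.pi, math.pi)
    c, s = math.cos(theta), math.sin(theta)
    new_end = rotate(pts[-1], a, k, c, s)        # only the chain end enters H
    d_h = -force_pn * (new_end[2] - pts[-1][2])
    if d_h <= 0.0 or rng.random() < math.exp(-d_h / KT):
        for m in range(i + 2, len(pts)):          # rotate everything beyond the pivot bond
            pts[m] = rotate(pts[m], a, k, c, s)
        return True
    return False


def force_ramp(n_units, forces, n_equil=20000, n_sample=20000, seed=1):
    """Raise the force step by step, equilibrating at each value, and return
    (force, mean extension along +z) pairs, mimicking a slow AFM pull."""
    rng = random.Random(seed)
    pts = build_extended_chain(n_units)
    pivots = pivot_bonds(len(pts))
    results = []
    for f in forces:
        for _ in range(n_equil):
            metropolis_pivot_step(pts, pivots, f, rng)
        total = 0.0
        for _ in range(n_sample):
            metropolis_pivot_step(pts, pivots, f, rng)
            total += pts[-1][2] - pts[0][2]
        results.append((f, total / n_sample))
    return results
```

For example, `force_ramp(25, [0.1, 1.0, 10.0, 100.0])` returns one (force, extension) pair per force; dividing the extensions by the number of units gives the reduced curves used in the scaling comparison. Because coordinates are updated incrementally, rounding errors slowly accumulate in the bond lengths, so very long runs would need occasional rebuilding of the geometry.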
Atom-Atom        Distance (Å)    σ (Å)
Cα1Cα2           3.82            0.05
Cα1N1            2.43            0.05
(illegible)      2.40            0.02
(illegible)      1.53            0.01
Cα1C1            1.53            0.01
C1N1             1.33            0.04
N1Cα2            1.47            0.01
N1C2             2.47            0.04
C1C2             3.2             0.2

TABLE I: Atomic distances between the main atoms of peptides. Indices give the order of atoms
from left to right in Fig. 1.



A·B                  Dot product    σ       Angle (degrees)
Cα1Cα2 · Cα2Cα3      0.19           0.25    78.9
N1Cα2 · C2Cα2        0.349          0.032   69.60

TABLE II: Angles in peptide units. The vectors are normalized, and the indices give the order
of atoms from left to right in Fig. 1. The FRC is not valid because the standard deviation of
Cα1Cα2 · Cα2Cα3 is too large compared to its average.

Therefore, we can use the experimental data of Ref. […] for comparison.

We continued the process until we obtained a fully extended chain (~32 nm for 89 residues).
We repeated it for chains with different numbers of residues and found that, at every force,
the extension is proportional to the number of residues. Fig. 3 shows that all the
extension-force curves collapse very well when the extensions are divided by the number of
residues (N). Because of this universality, it is useful to find a suitable fitting function
that represents the simulation results and reproduces them for later use. The logistic
dose-response function,

$$x = \frac{aN}{1 + (f_c/f)^{\gamma}} \qquad (2)$$

is a good candidate. Although it is simple and has only three parameters, it includes all the
important features of the curves: a parameter controlling the transition height (aN), another
controlling the transition center (f_c) and a third controlling the transition sharpness (γ).
In Eq. (2), x, f and N are the extension, the force and the number of residues, respectively.
The contribution of each peptide unit to the total length is a = 0.3640 nm, and f_c = 12.46 pN
is the critical force that separates the swollen and the extended phases. The last parameter
is the fitting exponent γ = 1.021, which controls the transition sharpness. To fit the
function to an experiment, we must also allow for a possible offset from zero.

FIG. 3: Results of simulations for N = 25, 50, 100 and 200 peptides. All the extensions have been
divided by N, and the curves collapse perfectly. The solid line shows Eq. (2), and the related
parameters are in Table III.
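Eq. (2) is easy to evaluate directly. The sketch below uses the parameter values quoted in the text as defaults (the Table III values can be passed as keyword arguments instead); the function names are hypothetical. At f = f_c the denominator equals 2, so the chain sits at half of its full length aN, which is why f_c marks the center of the transition.

```python
def reduced_extension(force_pn: float, a_nm: float = 0.3640,
                      fc_pn: float = 12.46, gamma: float = 1.021) -> float:
    """x/N from Eq. (2): a / (1 + (f_c/f)^gamma), in nm per residue."""
    if force_pn <= 0.0:
        return 0.0  # limit of Eq. (2) as f -> 0+
    return a_nm / (1.0 + (fc_pn / force_pn) ** gamma)


def extension(force_pn: float, n_residues: int, **params: float) -> float:
    """Extension x (nm) of a chain of n_residues at force f (pN), Eq. (2)."""
    return n_residues * reduced_extension(force_pn, **params)


if __name__ == "__main__":
    for f in (1.0, 12.46, 100.0):
        print(f, round(extension(f, 89), 2))
```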

Eq. (2) holds for the stretching of a chain without self-interaction, but we can also use it
to fit the saw-tooth pattern. To do this, we consider that the multi-domain chain contains
both unfolded entropic chains and folded domains. We know the behavior of the former from
Eq. (2), and we can approximate the latter as rigid, inextensible objects of unknown size.
We determine this size from the peak-to-peak distance of the saw-tooth patterns: the distance
between two consecutive peaks equals the length of an unfolded domain minus its end-to-end
distance when it is folded. The expression

$$x = \frac{a\,nN}{1 + (f_c/f)^{\gamma}} + (n_{Ig} - n)\,R_d + x_0 \qquad (3)$$

gives the extension after the nth unfolding event. Here N is the number of residues
in a single domain, and n_Ig is the number of Ig repeats in the experiment. The parameter
R_d is the distance between the first residues of two consecutive folded domains, which is
equal to the end-to-end distance of a folded Ig domain plus the length of one peptide unit.
The last parameter, x_0, is the offset from zero in the experiments.

FIG. 4: Reduced extension versus force.

Parameter    Value
a            0.3651 ± 0.0004 nm
f_c          13.0 ± 0.1 pN
γ            1.072 ± 0.004
R_d          5.4 ± 0.3 nm

TABLE III: Parameters of Eq. (3). a, f_c and γ have been calculated by simulation; R_d has been
obtained by fitting the simulation results to the saw-tooth patterns.

The first term in Eq. (3) is the extension of the unfolded domains, and the second term is
the contribution of the folded domains to the contour length. Note that we have assumed
that the folded domains are rigid objects that only increase the length of the chain and
contribute neither to the entropy nor to the force; therefore, we expect the simulations
to fit the last unfolding events better, when most of the domains are unfolded.
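A sketch of Eq. (3) in the same style, with the Table III values as defaults and hypothetical function and argument names. The eight-domain chain in the example is hypothetical; N = 89 is the domain size used above. Evaluating successive n at a fixed force gives the spacing between consecutive branches, x_(n+1) - x_n = aN/(1 + (f_c/f)^γ) - R_d, which is how R_d sets the peak-to-peak distance.

```python
def sawtooth_extension(force_pn: float, n_unfolded: int, n_domains: int,
                       n_residues: int, r_d_nm: float = 5.4, x0_nm: float = 0.0,
                       a_nm: float = 0.3651, fc_pn: float = 13.0,
                       gamma: float = 1.072) -> float:
    """Eq. (3): extension (nm) after the n-th unfolding event.

    Unfolded domains stretch entropically as in Eq. (2); the remaining
    n_domains - n_unfolded folded domains are rigid spacers of length R_d.
    """
    if not 0 <= n_unfolded <= n_domains:
        raise ValueError("need 0 <= n_unfolded <= n_domains")
    entropic = 0.0
    if force_pn > 0.0:
        entropic = a_nm * n_unfolded * n_residues / (1.0 + (fc_pn / force_pn) ** gamma)
    return entropic + (n_domains - n_unfolded) * r_d_nm + x0_nm


if __name__ == "__main__":
    # Hypothetical chain of eight 89-residue Ig domains held at 100 pN.
    for n in range(9):
        print(n, round(sawtooth_extension(100.0, n, 8, 89), 1))
```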

We know N and have found a and f_c by simulation; therefore, only R_d remains unknown, and
it can be used as a fitting parameter to set the peak-to-peak distance. We obtain a good fit
with R_d = 5.4 nm, which gives an end-to-end distance of approximately 5.0 nm for a single
folded domain. This value roughly agrees with the 4.3 nm size obtained by NMR […]. The small
discrepancy (0.7 nm, almost twice the length of a peptide unit) might be due to deformation
of the domain structure under tension.

Figure 5 compares the simulation results with the experimental data, which were taken directly
from the graph in Ref. […]. The fitting parameters are listed in Table III.

IV. CONCLUSION 

We used a model that has only one free parameter: the number of residues. Using this
model, we could accurately fit the simulation results to single unfolding events. This
shows that, within the experimental precision, the primary structure of the protein is not
important in the entropic regime.

Our results suggest that this model is accurate enough to study protein backbones in
the entropic regime. Simpler models need additional fitting parameters to reproduce
the experimental data.

Since our model performed well in describing the mechanical properties of protein backbones
in the entropic regime, one may go one step further and use it as a basis for studies
like [25], in which our model can reduce the number of effective interactions and present a
clearer and more straightforward picture. Moreover, it is easy to include the hydrogen
bonds along the backbone, since it suffices to add the oxygen and hydrogen atoms of the
backbone.

FIG. 5: We fitted the simulation (lines) to the experiment (circles) of Ref. […] following the
concepts of Eq. (3). The related fitting parameters are in Table III.